In the visualization step, we found three customers with more than 25 reviews each. We therefore remove these customers from the data.
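The filtering step could be sketched as follows. This is a minimal illustration, not the code used for the analysis: the record layout and the `customer_id` field name are assumptions, and the toy data is made up.

```python
from collections import Counter


def drop_heavy_reviewers(reviews, max_reviews=25):
    """Keep only reviews written by customers with at most `max_reviews` reviews."""
    # Count how many reviews each customer wrote.
    counts = Counter(r["customer_id"] for r in reviews)
    return [r for r in reviews if counts[r["customer_id"]] <= max_reviews]


# Toy data (hypothetical): customer "c1" has 26 reviews, customer "c2" has 2.
reviews = [{"customer_id": "c1", "text": "ok"}] * 26 + [
    {"customer_id": "c2", "text": "great warranty"},
    {"customer_id": "c2", "text": "too expensive"},
]
kept = drop_heavy_reviewers(reviews)
print(len(kept))
```

Filtering on the per-customer count, rather than on a fixed list of customer names, keeps the rule reusable if the data is refreshed.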
We then cleaned and stemmed the reviews.
There are three documents: Apple, BestBuy and Microsoft.
The bar plots below show the top 20 terms in each document, ranked by tf-idf. The Apple reviews contain the terms AppleCare, iPhone, MacBook and iPad, which suggests that customers use the AppleCare warranty system for their products, especially the iPhone and MacBook. By contrast, customers choose the BestBuy warranty system for many products and brands, especially televisions and sound systems; products from Sony, Vizio and Insignia are the most prominent. Finally, for Microsoft, the terms suggest that customers use SmartGuard as the warranty system for their products, although there are too few reviews to draw a firm conclusion.
Top 20 words based on tf-idf
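To make the ranking concrete, here is a minimal sketch of a tf-idf computation over one token list per company. It assumes term frequency is the within-document proportion and idf is the natural log of (number of documents / document frequency); the weighting scheme actually used for the plots may differ, and the toy token lists are invented.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, k=20):
    """docs maps a document name to its token list; returns the top-k (term, score) pairs."""
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    top = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores = {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        # Sort by descending score, breaking ties alphabetically.
        top[name] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return top


# Toy documents (hypothetical tokens, already cleaned).
docs = {
    "Apple": "applecare iphone warranty macbook applecare".split(),
    "BestBuy": "tv warranty sony tv vizio".split(),
    "Microsoft": "smartguard warranty surface".split(),
}
top = top_tfidf_terms(docs, k=2)
print(top["Apple"])
```

A term such as "warranty" that occurs in all three documents gets an idf of zero under this scheme, which is why company-specific terms dominate the plots.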
A word cloud is another way to identify the most important terms of each document based on the tf-idf metric. The higher a term's weight in a document, the larger and bolder it appears.
Instead of using bigrams to identify the most frequent pairs of terms, we will use collocations. In the most general sense, a collocation is a group of words that tend to occur together often. According to Wikipedia, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance.
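One common way to quantify "more often than expected by chance" is pointwise mutual information (PMI) over adjacent word pairs. The sketch below uses PMI purely as an illustration; it is not necessarily the scoring method behind the network graphs, and the `min_count` threshold and toy sentence are assumptions.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Score adjacent word pairs by PMI, keeping pairs seen at least `min_count` times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scored = []
    for (w1, w2), n in bigrams.items():
        if n < min_count:
            continue
        p_pair = n / n_bi  # observed probability of the pair
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)  # expected if independent
        scored.append(((w1, w2), math.log2(p_pair / p_indep)))
    return sorted(scored, key=lambda kv: -kv[1])


# Toy text (hypothetical).
tokens = (
    "extended warranty is good but extended warranty is pricey "
    "and good service is rare"
).split()
collocations = pmi_collocations(tokens)
print(collocations)
```

A positive PMI means the pair occurs together more often than its individual word frequencies would predict, which matches the definition quoted above.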
Company Apple: As we can see, customers talk about many subjects, for example the AppleCare price, replacement, protection …
Company BestBuy: In general, customers describe the BestBuy warranty as the best.
Company Microsoft: Because the number of reviews is very low, the network graph does not show any results, except that customers use the SmartGuard warranty system for Microsoft products.
Based on the bar plots below, the proportion of reviews with positive sentiment is significantly higher than the proportion with negative sentiment for the Apple and BestBuy documents. For Microsoft, the proportions of positive and negative reviews are approximately equal.
Overall, the average sentiment score of the Microsoft document is negative with high variability, which means that the reviews vary widely and that customers are, on average, not happy with the warranty system. By contrast, the average sentiment scores of the Apple and BestBuy documents are positive with low variability.
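The quantities discussed above (share of positive reviews, mean score and variability) can be illustrated with a simple lexicon-based scorer. This is a sketch under stated assumptions: the tiny `POSITIVE` and `NEGATIVE` word sets are hypothetical stand-ins for a real sentiment lexicon, and the score is simply (positive words minus negative words) divided by review length.

```python
from statistics import mean, pstdev

# Hypothetical mini-lexicon; a real analysis would use a published sentiment lexicon.
POSITIVE = {"good", "great", "happy", "worth"}
NEGATIVE = {"bad", "expensive", "useless", "refused"}


def review_score(tokens):
    """Net sentiment of one review, normalized by its length."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens) if tokens else 0.0


def summarize(reviews):
    """Mean score, standard deviation and share of positive reviews for one document."""
    scores = [review_score(r.lower().split()) for r in reviews]
    return {
        "mean": mean(scores),
        "sd": pstdev(scores),
        "share_positive": sum(s > 0 for s in scores) / len(scores),
    }


# Toy reviews (hypothetical).
summary = summarize(["great warranty", "bad and expensive plan"])
print(summary)
```

A negative mean with a large standard deviation corresponds to the pattern described for Microsoft; a positive mean with a small standard deviation corresponds to Apple and BestBuy.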
The word clouds below show the most frequent positive and negative terms in each document.
– Company Apple
– Company BestBuy
– Company Microsoft
Based on the bar plots below, the proportions of negative emotions (anger, disgust, fear and sadness) are 36.9%, 13% and 34.4% in the Microsoft, BestBuy and Apple documents, respectively. This means that customers are emotionally more comfortable with the BestBuy warranty system than with the others (the difference is statistically significant).
To analyse the importance of warranty products for each company, we will use some keywords.
The dot plots below show the relative frequency of each keyword by document. We conclude that customers describe the Microsoft warranty system as expensive, cheap and bad. For Apple, customers describe the warranty system as expensive and worth it. Finally, for BestBuy, customers are happy with the warranty system.
Instead of looking at the association between a document and a single keyword, we will look at the association between a document and a keyword together with its synonyms. For example, the synonyms of the keyword LEAVE are abandon, depart, disappear….
From the dot plots below, we conclude that customers are happier with the BestBuy warranty system than with the others, especially the Microsoft warranty system.
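The keyword-plus-synonyms comparison can be sketched as a dictionary lookup: each keyword maps to a set of words, and the relative frequency of the whole set is computed per document. The synonym sets and toy token lists below are hypothetical, chosen only to illustrate the idea; a single keyword is the special case of a set with one word.

```python
from collections import Counter

# Hypothetical keyword groups: each keyword together with a few synonyms.
KEYWORD_GROUPS = {
    "expensive": {"expensive", "costly", "pricey"},
    "happy": {"happy", "glad", "pleased"},
}


def group_relative_frequency(docs, groups):
    """For each document, the share of tokens that belong to each keyword group."""
    result = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        result[name] = {
            key: (sum(counts[w] for w in words) / total if total else 0.0)
            for key, words in groups.items()
        }
    return result


# Toy documents (hypothetical tokens).
docs = {
    "Apple": "applecare is pricey but i am pleased".split(),
    "BestBuy": "happy glad happy customer".split(),
}
freq = group_relative_frequency(docs, KEYWORD_GROUPS)
print(freq)
```

Dividing by document length makes the values comparable across companies with very different numbers of reviews.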
To identify the context of each keyword within the source text (the original reviews), we will use the keywords-in-context (KWIC) method.
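A keywords-in-context lookup returns each occurrence of a keyword together with a few words on either side. The following is a minimal sketch, assuming whitespace tokenization and case-insensitive matching; the `window` parameter and the sample review are illustrative.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, match, right context) for every occurrence of `keyword`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Sample review (hypothetical).
review = "The AppleCare warranty was expensive but the warranty replacement was fast"
hits = kwic(review.split(), "warranty", window=2)
for left, match, right in hits:
    print(f"{left:>20} | {match} | {right}")
```

Reading the surrounding words shows whether a keyword such as "expensive" is a complaint or part of a phrase like "expensive but worth it".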
Company Apple
Company BestBuy
Company Microsoft